Questions

Answer the following questions in a PDF document:

CPU Configuration

(1.5 points) What are the associativity, cache line size, and capacity of the L1 data cache used by two_level.py?

tip
You will need to find the default values used by gem5 when running the script.
(1.5 points) How many bits are used for the tag, index, and offset, respectively, in the L1 data cache, assuming 32-bit addressing?
(1 point) Why is it necessary to change from the TimingSimpleCPU for the attack to be successful?
(3 points) What change(s) can you make to the parameters of the caches in configs/learning_gem5/part1/caches.py that will result in the attack failing? Explain why your changes mitigated the attack.

note

The attack has failed if the program outputs 'Unclear' instead of discovering a letter. For example: Reading at malicious_x = 0xffffffffffdfebb8... Unclear
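The tag, index, and offset question above comes down to simple arithmetic on the cache parameters. The sketch below shows that arithmetic for a purely hypothetical cache (32 KiB, 64-byte lines, 4-way set associative). These values are illustrative assumptions, not the gem5 defaults you are asked to find, and the function name `address_bit_fields` is invented for this example. It also assumes every size is a power of two.

```python
def address_bit_fields(capacity_bytes, line_size_bytes, associativity, address_bits=32):
    """Return (tag_bits, index_bits, offset_bits) for a set-associative cache.

    Assumes capacity, line size, and the resulting number of sets are powers of two.
    """
    num_sets = capacity_bytes // (line_size_bytes * associativity)
    offset_bits = line_size_bytes.bit_length() - 1  # log2 of the line size
    index_bits = num_sets.bit_length() - 1          # log2 of the number of sets
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


# Hypothetical cache: 32 KiB capacity, 64-byte lines, 4-way set associative.
print(address_bit_fields(32 * 1024, 64, 4))  # prints (19, 7, 6)
```

The same function can be called with the values you find for two_level.py once you have looked them up.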

`spectre.c`

(1.5 points) What does the intrinsic _mm_clflush do? Explain the purpose of the calls to it on lines 66 and 71:

spectre.c
for (i = 0; i < 256; i++)
 results[i] = 0;
for (tries = 999; tries > 0; tries--) {

/* Flush array2[256*(0..255)] from cache */
for (i = 0; i < 256; i++)
  _mm_clflush( & array2[i * 512]); /* intrinsic for clflush instruction */

/* 30 loops: 5 training runs (x=training_x) per attack run (x=malicious_x) */
training_x = tries % array1_size;
for (j = 29; j >= 0; j--) {
  _mm_clflush( & array1_size);
  for (volatile int z = 0; z < 100; z++) {} /* Delay (can also mfence) */

(2 points) Which cache lines are 'numbered' to communicate a value? Justify your answer.
(2 points) Why is victim_function called with a sequence of five in-bounds values and then an out-of-bounds value?
(1 point) Why is malicious_x initialized to (size_t)(secret - (char * ) array1)?

(2 points) What is the purpose of the loop from lines 86-94? How does it achieve this purpose?

spectre.c
/* Time reads. Order is lightly mixed up to prevent stride prediction */
for (i = 0; i < 256; i++) {
  mix_i = ((i * 167) + 13) & 255;
  addr = & array2[mix_i * 512];
  time1 = __rdtscp( & junk); /* READ TIMER */
  junk = * addr; /* MEMORY ACCESS TO TIME */
  time2 = __rdtscp( & junk) - time1; /* READ TIMER & COMPUTE ELAPSED TIME */
  if (time2 <= CACHE_HIT_THRESHOLD && mix_i != array1[tries % array1_size])
    results[mix_i]++; /* cache hit - add +1 to score for this value */
}
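The index expression `((i * 167) + 13) & 255` in the excerpt above can be checked in isolation. The short sketch below (the helper name `mixed_order` is hypothetical) confirms that the expression visits every value from 0 to 255 exactly once, just not in ascending order. It makes no claim about how any particular prefetcher reacts to that order.

```python
def mixed_order(count=256):
    """Reproduce the probe order from the timing loop; count must be a power of two."""
    return [((i * 167) + 13) & (count - 1) for i in range(count)]


order = mixed_order()

# Every index 0..255 appears exactly once, so no candidate byte value is skipped.
assert sorted(order) == list(range(256))

print(order[:8])  # prints [13, 180, 91, 2, 169, 80, 247, 158]
```

Because 167 is odd, multiplying by it modulo 256 is a bijection, which is why the sequence is a permutation.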

(1.5 points) array2 is defined and accessed using the specific constant 512, as can be seen on lines 35, 43, 66, and 88.
- Why would the attack fail if that constant were changed from 512 to 2?
- Why did the author of the code choose the constant 512?

Log-File and disassembly

(3 points) Annotate the following instructions of victim_function in vuln.s, placing each comment on the same line as the relevant instruction:
- Place the comment #loads array1_size
- Place the comment #loads array1[x]
- Place the comment #loads array2[array1[x] * 512]
(4 points) Provide the address taken from pipeview-spec.out and justification for the following instructions:
- Which instruction has a cache miss?
- Which instruction is mispredicted?
- Which instruction in vuln.s loads the secret value into a register? Is this instruction squashed?
- Which instruction perturbs the cache using the secret value?

Overview

(2 points) How would the implementation of _mm_clflush differ between a direct-mapped cache and a fully associative cache? Explain in high-level terms.
(4 points) How would you change the design of an out-of-order CPU to fix this vulnerability?
- How would your change impact the performance of the CPU?
- Explain how your suggested change would fix the issue.
(3 points) The Side-channels section above explained the Flush + Reload side channel, which operates by evicting specific addresses and then accessing those same addresses again to detect a value. Another side channel is known as Prime + Probe, which operates by detecting contention in cache sets. In this side channel, the receiver primes the cache by numbering the cache sets and then completely filling them with addresses known to the receiver. The sender then performs an access that maps to one of the numbered cache sets, which evicts one of the lines placed there by the receiver. Finally, the receiver probes the cache to determine which set was accessed by observing which access misses.

Assume that in the following example Sender and Receiver are transmitting the value 127 through a 4-way set associative cache using Prime + Probe:
- Sender and Receiver agree on cache set numbering 0-255.
- Receiver reads 4 addresses that map to cache set 0. Receiver repeats this to fill all 256 cache sets in turn.
- Sender reads 1 address that maps to cache set 127. This will evict 1 of Receiver's values in set 127.
- Receiver reads all 1024 addresses it originally brought into the cache. Receiver notices that a miss occurred in set 127.
- Receiver gets the value 127.
Would Flush + Reload and Prime + Probe both transmit data in a fully associative cache? Justify your answer.
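The 4-way set associative example above can be replayed with a toy cache model. The sketch below is a simplified simulation under stated assumptions: 64-byte lines, true LRU replacement, and a sender address region that does not overlap the receiver's. None of these are specified by the assignment, and the class and function names are invented for illustration. It does not model gem5 or real hardware timing. A hit or miss is simply returned by the model rather than measured.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy cache model with LRU replacement (an assumption of this sketch)."""

    def __init__(self, num_sets=256, ways=4, line_size=64):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, addr):
        """Return True on a hit, False on a miss; fill the line on a miss."""
        line = addr // self.line_size
        lines = self.sets[line % self.num_sets]
        if line in lines:
            lines.move_to_end(line)    # mark as most recently used
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)  # evict the least recently used line
        lines[line] = True
        return False


def prime_probe(sent_value):
    cache = SetAssociativeCache()
    # Receiver picks `ways` addresses per set: 4 * 256 = 1024 addresses in total.
    probe_addrs = {
        s: [(w * cache.num_sets + s) * cache.line_size for w in range(cache.ways)]
        for s in range(cache.num_sets)
    }
    # Prime: fill every set with receiver-known lines.
    for addrs in probe_addrs.values():
        for a in addrs:
            cache.access(a)
    # Sender: one access that maps to the set numbered `sent_value`.
    sender_base = 1 << 30  # hypothetical region disjoint from the receiver's
    cache.access(sender_base + sent_value * cache.line_size)
    # Probe: re-read everything and record the sets where any access missed.
    return [s for s, addrs in probe_addrs.items()
            if not all([cache.access(a) for a in addrs])]


print(prime_probe(127))  # prints [127]
```

Changing `num_sets` and `ways` in the model is one way to explore how the channel behaves under other cache organizations.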

(1 point) Partner Evaluation: Please complete the confidential partner evaluation for your partner. If you both complete the evaluation, you get a point.
